Guidelines for Multiple Testing in Impact Evaluations of Educational Interventions

Authors

Peter Z Schochet

Amy Feldman

John Burghardt

Abstract

A. INTRODUCTION

Studies that examine the impacts of education interventions on key student, teacher, and school outcomes typically collect data on large samples and on many outcomes. In analyzing these data, researchers typically conduct multiple hypothesis tests to address key impact evaluation questions. Tests are conducted to assess intervention effects for multiple outcomes, for multiple subgroups of schools or individuals, and sometimes across multiple treatment alternatives. In such instances, a separate t-test is often performed for each contrast to test the null hypothesis of no impact, with the type I error rate (statistical significance level) typically set at α = 5 percent for each test. This means that, for each test, the chance of erroneously finding a statistically significant impact is 5 percent. However, when the hypothesis tests are considered together, the "combined" type I error rate could be considerably larger than 5 percent. For example, if all null hypotheses are true, the chance of finding at least one spurious impact is 23 percent if 5 independent tests are conducted, 64 percent for 20 tests, and 92 percent for 50 tests (as discussed in more detail later in this report). Thus, without accounting for the multiple comparisons being conducted, users of the study findings may draw unwarranted conclusions. At the same time, statistical procedures that correct for multiple testing typically result in hypothesis tests with reduced statistical power, that is, a lower probability of rejecting the null hypothesis given that it is false. Stated differently, these adjustment methods reduce the likelihood of identifying real differences between the contrasted groups. This is because controlling for multiple testing involves lowering the type I error rate for individual tests, with a resulting increase in the type II error rate.
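The combined error rates quoted above follow from the standard formula for independent tests, 1 − (1 − α)^N. The short Python sketch below is an illustration only: the function name `familywise_error_rate` is hypothetical and not taken from the report, and it assumes the tests are independent and that every null hypothesis is true.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests.

    Assumes every null hypothesis is true and the tests are independent,
    so the probability that no test rejects is (1 - alpha) ** num_tests.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


# Print the combined type I error rate for the test counts named in the text.
for n in (5, 20, 50):
    print(f"{n:>2} tests: {familywise_error_rate(n):.0%}")
```

With correlated tests, as is common when outcomes are related, the combined rate would generally differ from these values, so the formula is best read as a reference case.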
Simulation results presented later in this report show that if statistical power for an uncorrected individual test is 80 percent, the commonly used Bonferroni adjustment procedure reduces statistical power to 59 percent if 5 tests are conducted, 41 percent for 20 tests, and 31 percent for 50 tests. Thus, multiplicity adjustment procedures can lead to substantial losses in statistical power. There is disagreement about the use of multiple testing procedures and the appropriate tradeoff between type I error and statistical power (type II error). Saville (1990) argues against multiplicity control in order to avoid statistical power losses, and contends that common sense and information from other sources should be used to protect against errors of interpretation. Cook and Farewell (1996) …
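The report attributes the Bonferroni power figures to simulations. As a rough cross-check, the sketch below uses a normal approximation instead of a simulation: it assumes two-sided z-tests, α = 5 percent, and a true effect sized so that a single uncorrected test has 80 percent power. The function name `bonferroni_power` and these modeling choices are assumptions made for illustration, not the report's method.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and sd 1


def bonferroni_power(num_tests: int, alpha: float = 0.05,
                     base_power: float = 0.80) -> float:
    """Approximate power of one test after a Bonferroni correction.

    Assumption: two-sided z-test; the rejection probability in the tail
    opposite to the true effect is negligible and is ignored.
    """
    # Effect size (in standard-error units) at which a single uncorrected
    # two-sided test at level alpha has power equal to base_power.
    effect = (_STD_NORMAL.inv_cdf(1 - alpha / 2)
              + _STD_NORMAL.inv_cdf(base_power))
    # Bonferroni runs each test at level alpha / num_tests.
    critical = _STD_NORMAL.inv_cdf(1 - alpha / (2 * num_tests))
    # Probability that the test statistic exceeds the stricter threshold.
    return _STD_NORMAL.cdf(effect - critical)


for n in (1, 5, 20, 50):
    print(f"{n:>2} tests: power = {bonferroni_power(n):.0%}")
```

Under these assumptions the approximation gives roughly 59, 41, and 31 percent for 5, 20, and 50 tests, in line with the figures quoted above.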




Publication date: 2008
